Abstract. It has recently been shown that the marginalization paradox (MP) can be resolved by
interpreting improper inferences as probability limits. The key to the resolution is that probability 
limits need not satisfy the formal Bayes' law, which is used in the MP to deduce an inconsistency. 
In this paper, I explore the differences between probability limits and the more familiar pointwise 
limits, which do imply the formal Bayes' law, and show how these differences underlie some key 
differences in the interpretation of the MP. 
INTRODUCTION

The marginalization paradox (MP) is an apparent inconsistency in Bayesian inference 
that can arise from the use of improper priors. It was discovered in 1972 by Dawid and 
Stone [2]; together with Zidek, they published a comprehensive analysis in 1973 [3]. We 
follow Jaynes in referring to these authors as "DSZ." 

The MP arises in problems with a particular structure, where there are two different 
ways of computing the same marginal posterior. When improper priors are used, the two 
results are usually incompatible; an inconsistency cannot arise with proper priors. The 
improper inferences are computed as "formal posteriors" using the usual Bayes' law, 
$\pi(\theta|x) \propto p(x|\theta)\,\pi(\theta)$, with an improper prior.

The MP is important to objective Bayes because the "noninformative" priors required 
by the theory are typically improper. By suggesting that improper priors cannot be used 
consistently, the MP raises questions as to whether noninformative priors exist, and thus 
as to whether the objective Bayesian approach is tenable. We use the term objective 
Bayes to describe the approach of Harold Jeffreys and Edwin Jaynes, in which a certain 
state of information is represented by a unique prior. (Their approach is also known as 
"logical Bayes"; the term objective Bayes is now often used for a different but related 
approach in which the prior may depend on the estimand [4]; for a discussion of different 
approaches, cf. [5].) 

Edwin Jaynes strongly contested the view that the MP represented a true inconsis- 
tency, and engaged DSZ in a spirited debate; cf. [6] and references therein. The battle 
lines were drawn in the 1970s, and have changed little since. Neither side managed to 
convince the other, and the absence of new results has caused the paradox to be largely
set aside, even though it has never been completely understood.

Recently, I have shown [1] that the MP can be resolved if probability limits, rather 
than formal posteriors, are used to define improper inferences. The purpose of this paper 
is to reconsider the differences between Jaynes and DSZ in the light of this new result. 
One might assume, in reviewing their debates, that Jaynes and DSZ were separated by 
an unbridgeable chasm. It is shown, by contrast, that the differences between Jaynes and 
DSZ hinge on a single assumption, which might at first appear as a mere technicality. 
This assumption is the type of limit used to define the improper inference. We then make 
the case that the limit process used to support the use of formal posteriors is unsound. 
Our analysis relies heavily on the ideas of Mervyn Stone, who first introduced the notion 
of probability limit, and has long maintained that the paradoxes associated with formal 
posteriors are due to the inadequacy of the pointwise limit. 

We also discuss the impact of these recent findings on the prospects for objective 
Bayes. One might have hoped that a resolution of the paradox might open the door to 
a refined theory of improper inference, using probability limits instead of formal poste- 
riors, which would provide a consistent foundation for objective Bayes. Unfortunately, 
there is strong evidence that probability limits rarely exist. Although we can identify 
some problems that we can now solve which previously led to inconsistency, it appears 
that probability limits do not exist for most of the problems previously leading to incon- 
sistencies. Thus, the MP remains a serious challenge to objective Bayes. 

THE MARGINALIZATION PARADOX 

We briefly describe the key elements of the MP. Let $p(x|\theta)$ be the density function for
some statistical model, $\pi(\theta)$ a possibly improper prior, and $\pi(\theta|x)$ the corresponding
posterior, as computed formally from Bayes' law. As noted above, if $\pi(\theta)$ is improper,
i.e., if its integral is infinite, we call $\pi(\theta|x)$ a formal posterior. Now suppose that
$x = (y, z)$ and $\theta = (\eta, \zeta)$, that the marginal density $p(z|\theta)$ depends on $\theta$ only through
$\zeta$, and that the marginal posterior $\pi(\zeta|x)$ depends on $x$ only through $z$. We denote these
functions by $p(z|\zeta)$ and $\pi(\zeta|z)$, respectively.

Intuition would now suggest that

$$\pi(\zeta|z) \propto p(z|\zeta)\,\pi(\zeta),$$

for some function $\pi(\zeta)$. That is, the marginalized quantities should satisfy a (possibly
formal) Bayes' law. In general, however, we find that they do not. The problems in 
which the inconsistencies arise are ordinary problems of statistical inference, although 
they need to satisfy certain symmetry properties. However, they are not pathological or 
contrived in any way. 

A schematic of the MP is presented in Figure 1. The paradox is sometimes dramatized 
by ascribing the different computations to two Bayesians, $B_1$ and $B_2$; the routes they take
are indicated in the Figure. For numerous examples, we refer the reader to [2, 3]. 

FIGURE 1. Schematic representation of marginalization paradox.



SIGNIFICANCE OF THE MP 



Before proceeding to our analysis, we discuss in greater detail the significance of the MP 
for objective Bayes. One of the perceived weaknesses in the Bayesian position has been 
the subjectivity inherent in using "personal opinion" to define the prior distribution. 
Objective Bayes addresses this issue by maintaining that prior distributions can be 
objectively associated with specific states of information. Inferences are only subjective 
in the sense that different subjects have different states of information. 

Objective Bayes maintains, in particular, that there is a unique numerical represen- 
tation of "complete ignorance." The operational definition of "complete ignorance" is 
that our prior beliefs are invariant under certain transformations of the parameter space. 
For example, if we are ignorant of the scale of the variance, our belief that $\sigma$ lies in an
interval $\Delta\sigma$ is the same as our belief that it lies in $10\,\Delta\sigma$. These transformations form a
mathematical group. It is here that the connection with the MP becomes apparent. If the 
symmetry group is noncompact, then any group-invariant (Haar) measure is improper. 

Many of the symmetries of interest in statistics are described by noncompact groups. 
For example, translational symmetry on R, the real numbers, is expressed by the non- 
compact translation group, whose invariant measure is Lebesgue measure, with infinite 
total measure. Similarly, scale invariance is described by the noncompact group of
positive reals. The invariant measure is $d\sigma/\sigma$, whose integral diverges. Multivariate analysis
typically involves the general linear group, which is also noncompact.
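
As a small numerical illustration of the scale-invariant measure $d\sigma/\sigma$, the following Python sketch computes the mass it assigns to an interval. The function name `scale_measure` and the sample intervals are hypothetical choices made for illustration only.

```python
import math


def scale_measure(lo, hi):
    """Mass that d(sigma)/sigma assigns to the interval (lo, hi), with 0 < lo < hi."""
    return math.log(hi / lo)


# Rescaling an interval by any positive factor leaves its mass unchanged.
print(scale_measure(1.0, 2.0), scale_measure(10.0, 20.0))

# The mass of (1/T, T) is 2*log(T), which grows without bound as T increases,
# so the measure cannot be normalized to a proper prior.
for T in (1e1, 1e3, 1e6):
    print(T, scale_measure(1.0 / T, T))
```

The first line of output shows the invariance that motivates the prior; the loop shows why that invariance forces impropriety.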

Ignorance priors are often used directly, but they also form the basis for the method 
of maximum entropy. Here, entropy is really "relative entropy," which is defined relative 
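to some base measure $\pi_0(\theta)$:

$$H(\pi) = -\int \pi(\theta)\,\log\frac{\pi(\theta)}{\pi_0(\theta)}\,d\theta.$$

In the absence of constraints representing additional information, maximum entropy is
achieved for $\pi(\theta) = \pi_0(\theta)$, which must thus represent a state of complete ignorance. If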
we cannot define a meaningful ignorance prior, we cannot regard maximum entropy as 
an "objective" procedure. 
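
The role of the base measure can be seen in a finite, proper setting, where relative entropy takes its standard form $-\sum_i p_i \log(p_i/q_i)$. The sketch below is an illustration only: the three-point base measure and the candidate distributions are hypothetical, and the improper case discussed in the text has no such normalized analogue.

```python
import math


def relative_entropy(p, base):
    """H(p) = -sum_i p_i * log(p_i / base_i) for two distributions on the same finite set."""
    return -sum(pi * math.log(pi / bi) for pi, bi in zip(p, base) if pi > 0.0)


base = [0.5, 0.3, 0.2]  # hypothetical proper base measure
for p in ([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.2, 0.3, 0.5], [1.0, 0.0, 0.0]):
    print(p, relative_entropy(p, base))
```

With no constraints imposed, the largest value (zero) is attained when the candidate equals the base measure, and every other candidate scores strictly lower.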




Jaynes clearly recognized the challenge the MP posed for his approach, and opposed
it vociferously. A statement of his views can be found in Chapter 15 of [6]. "[The 
MP]. . .seemed to threaten the consistency of all probability theory." "It has been able 
to do far more damage to the cause of scientific inference than any other [paradox]." 
"Scientific inference thus suffered a setback from which it will take decades to recover." 

Jaynes claims to identify several errors in the analysis of DSZ, but his basic criticism is 
that the analysis uses, at critical junctures, intuitive reasoning about improper quantities, 
and that the correct result can only be obtained by carefully considering limits of proper 
quantities, although this analysis does not appear to have been carried out. He claims 
that a correct analysis will reveal that $B_1$ and $B_2$ have used different prior information,
so it is only to be expected that their answers will differ. 

We will not attempt to address the specific arguments contained in [6] and elsewhere. 
Instead, we start from a point of agreement between Jaynes and DSZ, and show that 
the whole interpretation of the paradox hinges on how the specifics of that point are 
interpreted. 

LIMIT CONCEPTS 

The point of agreement is this: both Jaynes and DSZ agree that "infinite" quantities must 
be interpreted as limits of finite quantities. This is the main motif of Jaynes' Chapter 15: 
"The paradoxes of probability theory." It is expressed, for example, in the statement that 
"An improper pdf has meaning only as the limit of a well-defined sequence of proper 
pdfs." [6, p. 487]. DSZ are in complete agreement, and say so explicitly: "'Infinity' finds 
practical justification only when it can be interpreted as an idealized approximation of 
the finite." [7, p. 4] 

In particular, both Jaynes and DSZ agree that if $\pi(\theta)$ is an improper prior, then in
order to define the corresponding posterior, $\pi(\theta|x)$, we need to construct a sequence of
proper priors, $\{\pi_n(\theta)\}$, such that $\pi_n(\theta) \to \pi(\theta)$ in some sense, and define $\pi(\theta|x)$ as the
limit

$$\pi(\theta|x) = \lim_{n} \pi_n(\theta|x),$$

where each $\pi_n(\theta|x)$ is the ordinary posterior corresponding to $\pi_n(\theta)$. There is, however,
more than one way to define a limit!

Jaynes follows Jeffreys in adopting what may be called pointwise limits. We say that 
$\pi_{\mathrm{pt}}(\theta|x)$ is a pointwise limit for $p(x|\theta)$ and $\{\pi_n(\theta)\}$ if, for each $x$,

$$\int |\pi_n(\theta|x) - \pi_{\mathrm{pt}}(\theta|x)|\,d\theta \to 0. \qquad (1)$$

Jaynes adopts this definition explicitly on p. 471 of [6]. (He does not specify the sense in
which the measures $\pi_n(\theta|x)\,d\theta$ must converge to the measure $\pi_{\mathrm{pt}}(\theta|x)\,d\theta$, although this
is inessential; in (1), we have used convergence in total variation norm.) Jeffreys also
adopts this definition, although always implicitly. Thus, for example, Jeffreys writes
that "If in an actual series of observations the standard deviation is much more than
the smallest admissible value of $\sigma$, and much less than the largest, the truncation of the
distribution makes a negligible change in the results" [8, p. 121] (italics mine). A similar
example is worked out explicitly in [9, p. 68], where Jeffreys calculates a posterior as
the limit of $\pi_n(\theta|x)$ with $x$ fixed (in our notation).

Pointwise limits lead to the formal posterior, and both Jaynes and Jeffreys justify 
their use of formal posteriors on this basis. Typically this is done for specific examples, 
but in fact, the argument can be used to justify the formal posterior for essentially any 
reasonable prior. Cf. [10] for a general proof under weak assumptions. 

An alternative notion of limit is due to M. Stone [11, 12, 13]. We say that $\pi_{\mathrm{prob}}(\theta|x)$
is a probability limit for $p(x|\theta)$ and $\{\pi_n(\theta)\}$ if

$$\int \left[ \int |\pi_n(\theta|x) - \pi_{\mathrm{prob}}(\theta|x)|\,d\theta \right] m_n(x)\,dx \to 0, \qquad (2)$$

where $m_n(x)$ is the marginal data density, $\int p(x|\theta)\,\pi_n(\theta)\,d\theta$. Thus, pointwise limits
require that, for any fixed $x$, the bracketed expression eventually becomes small;
probability limits require that the average of this expression over $m_n(x)$ eventually becomes
small. The intuition behind this definition is that the true prior is close to one of the
$\pi_n(\theta)$, and $\pi_{\mathrm{prob}}(\theta|x)$ is an idealization. It is a useful idealization if it would usually be
a good approximation in the region where the data is expected.

At first glance, both definitions seem reasonable, and it may seem that the difference
could at most be technical. In fact, the difference is profound, as will become apparent
when we examine a particular example.



STONE'S EXAMPLE 



In this section, I present an example, due to Stone [14], which illustrates the difference 
between the two limit concepts, and also illustrates some disturbing properties of the 
pointwise concept. Consider a Gaussian random variable, $X \sim N(\theta, 1)$, and assume that,
rather than using the usual uniform prior on $\theta$, we use an exponential: $\pi(\theta) = \exp(a\theta)$.
Although no one would use this prior in practice, it illustrates phenomena that arise with 
realistic priors in more complicated problems [14]. 

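We may approximate the improper exponential prior with the sequence $\pi_n(\theta)$:

$$\pi_n(\theta) \propto \exp\left[a\theta - \frac{\theta^2}{2n}\right]. \qquad (3)$$

Pointwise limits and probability limits both exist for this sequence, but they are different:

$$\pi_{\mathrm{pt}}(\theta|x) \propto \exp\left[-\tfrac{1}{2}(\theta - x - a)^2\right], \qquad \pi_{\mathrm{prob}}(\theta|x) \propto \exp\left[-\tfrac{1}{2}(\theta - x)^2\right]. \qquad (4)$$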

The first expression is just the formal posterior; the second result follows from Stone's 
theorem [13]. 
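
For completeness, the formal posterior can be recovered by completing the square in the exponent of likelihood times prior. The short derivation below uses only the model $X \sim N(\theta, 1)$ and the improper prior $\exp(a\theta)$ stated above; the factor that depends on $x$ alone is absorbed into the normalization.

```latex
\begin{aligned}
p(x|\theta)\,\pi(\theta)
  &\propto \exp\!\left[-\tfrac{1}{2}(x-\theta)^2 + a\theta\right] \\
  &= \exp\!\left[-\tfrac{1}{2}(\theta - x - a)^2 + ax + \tfrac{1}{2}a^2\right] \\
  &\propto \exp\!\left[-\tfrac{1}{2}(\theta - x - a)^2\right]
  \quad\text{(as a function of } \theta\text{)}.
\end{aligned}
```

The formal posterior is therefore a unit-variance normal centered at $x + a$, shifted by $a$ relative to the probability limit, which is centered at $x$.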

To understand the difference between pointwise and probability limits, consider
Figure 2, which compares the priors $\pi(\theta)$ and $\pi_n(\theta)$. The likelihood is local, so the
posteriors $\pi(\theta|x)$ and $\pi_n(\theta|x)$ will be similar if the priors are nearly proportional in the
vicinity of $x$. Consider the interval $I_n = (-n^{1/4}, n^{1/4})$. As $n \to \infty$, the priors, which differ
only by the term $\theta^2/(2n) = O(1/\sqrt{n})$ in the exponential, will converge in $I_n$, and $\pi_n(\theta|x)$
will converge to $\pi(\theta|x)$ for each $x$ in $I_n$. As $n \to \infty$, however, the interval will expand
to cover the entire real line. Thus, the pointwise limit of the $\pi_n(\theta|x)$, which is the formal
posterior, will exist.

FIGURE 2. Comparison of improper prior and approximation.

But is this a good reason to regard the formal posterior as the limit of the $\pi_n(\theta|x)$?
Consider this from the standpoint of someone whose true prior is $\pi_n(\theta)$, and who seeks
an idealized posterior that he can use as a good approximation. As we have just seen,
the assertion that $\pi_n(\theta|x) \to \pi_{\mathrm{pt}}(\theta|x)$ is based entirely on agreement in the region close
to the origin, where the data will almost never be found! On the other hand, in the
region where the data is expected, $x \approx na$, $\pi_n(\theta|x)$ differs markedly from $\pi_{\mathrm{pt}}(\theta|x)$.
(Explicit formulas are given in the next section.) In this sense, the formal posterior is
a poor approximation to $\pi_n(\theta|x)$, for any $n$.
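
The contrast between the two limit concepts can be checked numerically. The following Python sketch assumes that the approximating priors are the normal densities $N(na, n)$, i.e. $\pi_n(\theta) \propto \exp[a\theta - \theta^2/(2n)]$, which is consistent with the marginal variance $1+n$ quoted in the next section. The function names, the grid sizes, and the choice $a = 1$ are illustrative assumptions, not part of the original analysis.

```python
import numpy as np


def normal_pdf(t, mean, var):
    """Density of N(mean, var) evaluated at t."""
    return np.exp(-(t - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def l1_distance(mean1, var1, mean2, var2, num=4001):
    """Riemann-sum approximation to the integral of |N(mean1, var1) - N(mean2, var2)|."""
    centre = 0.5 * (mean1 + mean2)
    spread = abs(mean1 - mean2) + 12.0 * np.sqrt(max(var1, var2))
    t = np.linspace(centre - spread, centre + spread, num)
    dt = t[1] - t[0]
    gap = np.abs(normal_pdf(t, mean1, var1) - normal_pdf(t, mean2, var2))
    return float(gap.sum() * dt)


def posterior_n(x, a, n):
    """Mean and variance of pi_n(theta|x) under the ASSUMED proper prior N(n*a, n)."""
    var = n / (n + 1.0)
    return var * (x + a), var


def fixed_x_gap(x, a, n, shift):
    """L1 distance, at one fixed x, between pi_n(.|x) and the candidate N(x + shift, 1)."""
    mean, var = posterior_n(x, a, n)
    return l1_distance(mean, var, x + shift, 1.0)


def averaged_gap(a, n, shift, num_x=201):
    """The same L1 distance averaged over the marginal m_n(x) = N(n*a, 1 + n)."""
    sd = np.sqrt(1.0 + n)
    xs = np.linspace(n * a - 6.0 * sd, n * a + 6.0 * sd, num_x)
    weights = normal_pdf(xs, n * a, 1.0 + n)
    weights = weights / weights.sum()  # normalised quadrature weights
    return float(sum(w * fixed_x_gap(x, a, n, shift) for x, w in zip(xs, weights)))


if __name__ == "__main__":
    a = 1.0  # illustrative value of the exponent in the improper prior exp(a*theta)
    # shift = a is the formal posterior N(x + a, 1); shift = 0 is the candidate N(x, 1).
    for n in (10, 1000, 100000):
        print(n,
              fixed_x_gap(0.0, a, n, shift=a), averaged_gap(a, n, shift=a),
              fixed_x_gap(0.0, a, n, shift=0.0), averaged_gap(a, n, shift=0.0))
```

Under these assumptions, the fixed-$x$ distance to the formal posterior shrinks as $n$ grows, while its average over $m_n(x)$ levels off at the distance between two unit normals whose means differ by $a$. For the candidate centered at $x$ the pattern is reversed: the averaged distance shrinks, and the fixed-$x$ distance does not.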

In the next section, we show that $\pi_{\mathrm{prob}}(\theta|x)$, which does not obey the formal Bayes'
law, is a good approximation in the region where the data is expected.

BAYES' LAW: LOCAL BUT NOT GLOBAL 

We infer from (2) that where $m_n(x)$ is large,

$$\pi_{\mathrm{prob}}(\theta|x) \approx \pi_n(\theta|x) \propto p(x|\theta)\,\pi_n(\theta). \qquad (5)$$

That is, $\pi_{\mathrm{prob}}(\theta|x)$ satisfies Bayes' law locally, and approximately. On the other hand, in
regions where $m_n(x)$ is small, there is no need for $\pi_{\mathrm{prob}}(\theta|x)$ to satisfy Bayes' law, and
in general, it will not.

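We examine this phenomenon using Stone's example. First, we show that Bayes' law
holds locally for $\pi_{\mathrm{prob}}(\theta|x)$, as given in Eq. (4); this justifies our assertion that $\pi_{\mathrm{prob}}(\theta|x)$
is a probability limit for this problem. It is sufficient to show that $\pi_{\mathrm{prob}}(\theta|x) \approx \pi_n(\theta|x)$
when $m_n(x)$ is large. The densities $\pi_n(\theta|x)$ and $m_n(x)$ are easily calculated. Let
$\sigma_n^2 = n/(1+n)$. Then

$$\pi_n(\theta|x) \propto \exp\left[-\frac{\left(\theta - \sigma_n^2 (x + a)\right)^2}{2\sigma_n^2}\right], \qquad m_n(x) \propto \exp\left[-\frac{(x - na)^2}{2(1+n)}\right].$$

Let $x = an + \epsilon$. Then

$$\pi_n(\theta|x) \propto \exp\left[-\frac{\left(\theta - x + \frac{\epsilon}{n+1}\right)^2}{2\sigma_n^2}\right].$$

In the region where $m_n(x)$ is large, $\epsilon = O(\sqrt{n+1})$. As $n \to \infty$, $\epsilon/(n+1) \to 0$ and $\sigma_n \to 1$,
so $\pi_{\mathrm{prob}}(\theta|x) \approx \pi_n(\theta|x)$, as was to be shown.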
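
The Gaussian algebra behind this section can be sanity-checked by brute-force quadrature. The sketch below assumes the proper prior $\pi_n(\theta) \propto \exp[a\theta - \theta^2/(2n)]$ and checks that the posterior $\pi_n(\theta|x)$ has variance $n/(n+1)$ and, for $x = an + \epsilon$, mean $x - \epsilon/(n+1)$. The function name `posterior_moments` and the numerical values of $n$, $a$, and $\epsilon$ are hypothetical.

```python
import numpy as np


def posterior_moments(x, a, n, half_width=12.0, num=200001):
    """Mean and variance of pi_n(theta|x), by quadrature of likelihood times prior."""
    theta = np.linspace(x - half_width, x + half_width, num)
    # log of N(x; theta, 1) * exp(a*theta - theta^2 / (2n)), up to an additive constant
    log_w = -0.5 * (x - theta) ** 2 + a * theta - theta ** 2 / (2.0 * n)
    w = np.exp(log_w - log_w.max())  # subtract the maximum for numerical stability
    w /= w.sum()
    mean = float(np.sum(w * theta))
    var = float(np.sum(w * (theta - mean) ** 2))
    return mean, var


n, a, eps = 50, 0.7, 3.0  # illustrative values only
x = a * n + eps
mean, var = posterior_moments(x, a, n)
print(mean, x - eps / (n + 1))  # numerical mean next to the closed form
print(var, n / (n + 1))         # numerical variance next to the closed form
```

The grid is centered on $x$ because, in the region where the data is expected, the posterior mean differs from $x$ only by $\epsilon/(n+1)$.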

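It is obvious, however, that Bayes' law does not hold globally for $\pi_{\mathrm{prob}}(\theta|x)$: by Eq. (4),

$$\pi_{\mathrm{prob}}(\theta|x) \propto \exp\left[-\tfrac{1}{2}(\theta - x)^2\right] \not\propto p(x|\theta)\,\pi(\theta) \propto \exp\left[-\tfrac{1}{2}(\theta - x - a)^2\right].$$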

DISCUSSION

The presence or absence of a MP is determined by the type of limit we use to define
improper inferences. In [1], it is shown that if $\pi(\theta|x)$ is a probability limit, then the
marginal, $\pi(\zeta|z)$, is also a probability limit. The MP shows that if $\pi(\theta|x)$ is a formal
posterior, then the marginal, $\pi(\zeta|z)$, is not necessarily a formal posterior. Thus, the
interpretation of improper inferences as probability limits is internally consistent, at least
with respect to the MP, whereas their interpretation as formal posteriors is not.

The use of probability limits, rather than pointwise limits, is already known to resolve
other difficulties, such as "strong inconsistency" [14] and "incoherence" [15]. Now that
probability limits have been shown to resolve the MP as well, we have a common
explanation for most of the key difficulties of improper inference: they arise because
pointwise limits, and the formal posteriors to which they give rise, are fundamentally
unsound. Probability limits, by contrast, appear to provide a consistent theory of improper
inference.

The local nature of Bayes' law for probability limits is the key to understanding the
MP. The requirement that $\pi(\theta|x)$ be a probability limit is essentially equivalent to the
requirement that Bayes' law hold locally, where $m_n(x)$ is large, in the sense of (5).
The requirement that $\pi(\theta|x)$ be a formal posterior is equivalent to the requirement
that Bayes' law hold globally. $B_1$ and $B_2$ both make inferences based on the latter
requirement. Thus, each of them makes inferences that are sometimes erroneous, and
so they frequently disagree.

In resolving the MP, we have followed Jaynes in regarding an improper inference as
a limit of ordinary inferences, based on proper priors. There is nothing in our procedure
that is inconsistent with the rules of probability theory, as developed by Jaynes in
Chapter 2 of [6]. These rules, however, do not apply to the case where the prior is
improper, nor do they stipulate how the improper case is to be regarded as a limit. In
particular, they do not justify the use of formal posteriors. The use of pointwise limits is
an additional assumption, the validity of which is open to question. Following Stone, we
have argued that pointwise limits are unsound, and that probability limits better capture
the intuitive meaning of convergence.

It remains to discuss the implications of this analysis for objective Bayesianism. There 
are strong indications that the requirement that an improper inference be a probability 
limit is very restrictive. In group models, Stone has shown that the formal posterior can 
only be a probability limit if the prior is right Haar measure and the group satisfies a 
technical condition, known as amenability [13]. Eaton and Sudderth have shown that 
many of the formal posteriors of multivariate analysis are "incoherent" or strongly 
inconsistent, and thus cannot be probability limits [16]. 

Probability limits may be used to construct improper inferences for the translation 
and scale groups, and these coincide with the formal posteriors. Since the most common 
applications of improper inference involve these groups, our analysis shows why formal 
posteriors appear to work in these simple situations. When we get to more complicated 
problems, however, such as those of multivariate analysis, it appears that probability
limits associated with the "ignorance priors" for the relevant symmetries, such as $GL(n)$, do
not exist. Thus, the use of probability limits restores the technical viability of objective
Bayes to only a very limited domain of improper problems. 